cgroups/systemd: add cgroup-v2 path to the list when using hybrid mode #2087
Fix an issue where the cgroup v2 path is configured incorrectly when systemd uses hybrid mode. Bump docker-runc to 1.0.0-rc8-r2 as well. See also opencontainers/runc#2087
Hello folks, any thoughts about this one? Thanks!
@cyphar
@AkihiroSuda I changed the names according to your suggestion.
Please squash commits and sign
@opencontainers/runc-maintainers PTAL
@mrunalp @kolyshkin PTAL
Should we ever try to support hybrid mode? If yes, it should at least be tested. Currently, we barely test v2.
FYI, I have just realized that this PR needs a rebase.
I think we don't want to support this two-headed beast of hybrid mode, and recommend using cgroup v2 unified instead. IOW, close this. @opencontainers/runc-maintainers WDYT?
Yeah, supporting hybrid mode will only lead to madness. cgroupv2 now supports all of the major controllers, so hybrid mode doesn't really make sense anymore. I think we should close this.
I agree that hybrid mode is meaningless for controllers like Anyway, we will need to close this if this remains unrebased.
@AkihiroSuda @kolyshkin @cyphar I rebased this.
@kolyshkin Does this change make sense given all of the changes you made last year to how subsystem paths are handled in cgroupfs? |
@@ -116,6 +116,12 @@ func FindCgroupMountpoint(cgroupPath, subsystem string) (string, error) {
		return "", errUnified
	}

	// If subsystem is empty it means that we are looking for the
	// cgroups2 path
I'd say "cgroup2 hybrid path" instead.
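The review hinges on what an empty subsystem argument should mean. A minimal Go sketch, assuming the lookup works by scanning /proc/self/mountinfo (the helper name `findMountpoint` and the sample data are made up for illustration; this is not runc's FindCgroupMountpoint): an empty subsystem selects the cgroup2 mount, which on a systemd hybrid host is /sys/fs/cgroup/unified, while a non-empty one is matched against the super options of cgroup v1 mounts.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleMountinfo is a made-up but format-correct excerpt of
// /proc/self/mountinfo on a systemd host in hybrid mode
// (cgroup2 mounted at /sys/fs/cgroup/unified).
const sampleMountinfo = `25 30 0:23 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:4 - tmpfs tmpfs ro,mode=755
26 25 0:24 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:5 - cgroup2 cgroup2 rw,nsdelegate
27 25 0:25 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,name=systemd
28 25 0:26 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:7 - cgroup cgroup rw,memory`

// findMountpoint returns the mount point serving subsystem. An empty
// subsystem asks for the cgroup2 (unified) mount instead of a v1 controller.
func findMountpoint(mountinfo, subsystem string) (string, error) {
	for _, line := range strings.Split(mountinfo, "\n") {
		// Fields before " - " include the mount point (5th field);
		// fields after it are fstype, source and super options.
		pre, post, ok := strings.Cut(line, " - ")
		if !ok {
			continue
		}
		preFields, postFields := strings.Fields(pre), strings.Fields(post)
		if len(preFields) < 5 || len(postFields) < 3 {
			continue
		}
		mountpoint, fstype, superOpts := preFields[4], postFields[0], postFields[2]

		if subsystem == "" {
			if fstype == "cgroup2" {
				return mountpoint, nil
			}
			continue
		}
		if fstype != "cgroup" {
			continue
		}
		// v1 controllers (and named hierarchies such as name=systemd)
		// appear in the super options of their cgroup mount.
		for _, opt := range strings.Split(superOpts, ",") {
			if opt == subsystem {
				return mountpoint, nil
			}
		}
	}
	return "", fmt.Errorf("no mountpoint found for subsystem %q", subsystem)
}

func main() {
	for _, sub := range []string{"", "memory", "name=systemd", "pids"} {
		mp, err := findMountpoint(sampleMountinfo, sub)
		fmt.Printf("%q -> %q (err: %v)\n", sub, mp, err)
	}
}
```

Matching on super options means named v1 hierarchies such as name=systemd are found the same way as ordinary controllers.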
Looks like it still makes sense. Two things
Finally, if we're doing it, we need a test case (apparently Ubuntu 20.04, which is used on GitHub Actions, is a good candidate for that).
(CI failure on centos7 is unrelated; it's #2760)
The empty string corresponds to /proc/PID/cgroup. Also, the v1 named group has nothing to do with the hybrid hierarchy.
Yes, I understand that it comes from there.
... and yet we put the path to it into a map of paths for v1 controllers. Anyway, I'm fine with the current approach -- it's a hack but at least it is documented in the code. The other two issues remain:
Currently, the parent process of the container is moved to the right cgroup v2 tree when systemd is using hybrid mode (last line with 0::):

    $ runc --systemd-cgroup run myid
    / # cat /proc/self/cgroup
    12:cpuset:/system.slice/runc-myid.scope
    11:blkio:/system.slice/runc-myid.scope
    10:devices:/system.slice/runc-myid.scope
    9:hugetlb:/system.slice/runc-myid.scope
    8:memory:/system.slice/runc-myid.scope
    7:rdma:/
    6:perf_event:/system.slice/runc-myid.scope
    5:net_cls,net_prio:/system.slice/runc-myid.scope
    4:freezer:/system.slice/runc-myid.scope
    3:pids:/system.slice/runc-myid.scope
    2:cpu,cpuacct:/system.slice/runc-myid.scope
    1:name=systemd:/system.slice/runc-myid.scope
    0::/system.slice/runc-myid.scope

However, if a second process is executed in the same container, it is not moved to the right cgroup v2 tree:

    $ runc exec myid /bin/sh -c 'cat /proc/self/cgroup'
    12:cpuset:/system.slice/runc-myid.scope
    11:blkio:/system.slice/runc-myid.scope
    10:devices:/system.slice/runc-myid.scope
    9:hugetlb:/system.slice/runc-myid.scope
    8:memory:/system.slice/runc-myid.scope
    7:rdma:/
    6:perf_event:/system.slice/runc-myid.scope
    5:net_cls,net_prio:/system.slice/runc-myid.scope
    4:freezer:/system.slice/runc-myid.scope
    3:pids:/system.slice/runc-myid.scope
    2:cpu,cpuacct:/system.slice/runc-myid.scope
    1:name=systemd:/system.slice/runc-myid.scope
    0::/user.slice/user-1000.slice/session-8.scope

This commit makes sure that processes executed with exec are placed into the right cgroup v2 tree. The implementation checks whether systemd is using hybrid mode (by checking whether cgroup v2 is mounted at /sys/fs/cgroup/unified); if so, the path of the cgroup v2 slice for this container is saved into the cgroup path list.

The fs cgroup driver has a similar issue: in that case, neither runc run nor runc exec puts the process in the right cgroup v2 tree. This commit also fixes that.
Having the processes of the container in its own cgroup v2 is useful for any BPF programs that rely on bpf_get_current_cgroup_id(), like https://github.com/kinvolk/inspektor-gadget/ for instance.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
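In the cgroup listings in the commit message, the `0::` line is the cgroup v2 (unified) entry: its controller field is empty, which is why an empty-string key ends up standing for the v2 path. A minimal Go sketch of parsing /proc/PID/cgroup content into such a controller-to-path map; `parseCgroupFile` here is an illustrative helper written for this thread, not runc's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// runOutput is part of the /proc/self/cgroup output shown in the commit
// message for the process started with `runc run`.
const runOutput = `8:memory:/system.slice/runc-myid.scope
2:cpu,cpuacct:/system.slice/runc-myid.scope
1:name=systemd:/system.slice/runc-myid.scope
0::/system.slice/runc-myid.scope`

// parseCgroupFile maps each controller named in /proc/PID/cgroup to its path.
// Lines look like "ID:controller[,controller...]:path". The cgroup v2 line is
// "0::path"; splitting its empty controller field yields [""], so the v2
// path is stored under the "" key.
func parseCgroupFile(content string) (map[string]string, error) {
	paths := make(map[string]string)
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		// SplitN keeps any ':' inside the path intact.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line %q", line)
		}
		for _, ctrl := range strings.Split(parts[1], ",") {
			paths[ctrl] = parts[2]
		}
	}
	return paths, nil
}

func main() {
	paths, err := parseCgroupFile(runOutput)
	if err != nil {
		panic(err)
	}
	fmt.Println("v2 path:", paths[""])
	fmt.Println("cpuacct path:", paths["cpuacct"])
}
```

This is also why the v2 path can sit in the same map as the v1 controller paths, which is the "hack" discussed in the review.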
Check that runc run and runc exec put the process in the same cgroup v2 when using hybrid mode.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
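The test itself is not reproduced in this thread. As a hedged Go sketch of the check it describes (all names are hypothetical), the comparison boils down to reading the `0::` line from /proc/self/cgroup of the `runc run` process and of the `runc exec` process and requiring the two paths to match. The sample strings are excerpts of the two listings in the first commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Excerpts of the two listings from the first commit message: the init
// process started by `runc run`, and a process started by `runc exec`
// before the fix.
const (
	runCgroups = `1:name=systemd:/system.slice/runc-myid.scope
0::/system.slice/runc-myid.scope`
	execCgroupsBeforeFix = `1:name=systemd:/system.slice/runc-myid.scope
0::/user.slice/user-1000.slice/session-8.scope`
)

// unifiedPath returns the cgroup v2 path from the "0::" line of
// /proc/PID/cgroup content, and whether such a line was found.
func unifiedPath(content string) (string, bool) {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), true
		}
	}
	return "", false
}

// sameUnifiedCgroup reports whether two processes share a cgroup v2 path.
// Missing "0::" lines count as a mismatch.
func sameUnifiedCgroup(a, b string) bool {
	pa, okA := unifiedPath(a)
	pb, okB := unifiedPath(b)
	return okA && okB && pa == pb
}

func main() {
	fmt.Println(sameUnifiedCgroup(runCgroups, runCgroups))
	fmt.Println(sameUnifiedCgroup(runCgroups, execCgroupsBeforeFix))
}
```

With the fix applied, the exec'd process should report the same `0::/system.slice/runc-myid.scope` line, so the comparison would succeed.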
@kolyshkin I added a test and also implemented it for the fs cgroup driver. I agree that using an empty string is not the cleanest solution; I tried to implement it another way but hit some issues. Unfortunately, I don't have much time to keep investigating other approaches.
@mauriciovasquezbernal I am carrying this one in #3059, but this one can still be merged first (once rebased). Note that the last patch needs to put
Well, I'm actually wrong. This needs criu-3.16, which is still not released.
It was a very long one! Thanks a lot for driving it forward @kolyshkin!
Currently, the parent process of the container is moved to the right cgroup-v2 tree when systemd is using a hybrid model (last line with 0::):
However, if a second process is executed in the same container, it is not moved to the right cgroup-v2 tree:
Having the processes of the container in its own cgroup-v2 is useful for any BPF programs that rely on bpf_get_current_cgroup_id(), like https://github.com/kinvolk/inspektor-gadget/ for instance.
This commit makes sure that processes executed with exec are placed into the right cgroup-v2 tree. The implementation checks whether systemd is using hybrid mode (by checking whether cgroup-v2 is mounted at /sys/fs/cgroup/unified); if so, the path of the cgroup-v2 slice for this container is saved into the cgroup path list.